Pushing "Underfitting" to the Limit: Learning in Bidimensional Text Categorization

Authors

  • Giorgio Maria Di Nunzio
  • Alessandro Micarelli
Abstract

This paper analyzes two heuristic supervised learning algorithms for text categorization in two dimensions. The graphical properties of the bidimensional representation allow one to tailor a geometrical heuristic approach that exploits the peculiar distribution of text documents. In particular, we investigate the theoretical linear cost of the algorithms and try to push their performance to the limit. Experiments on the standard Reuters-21578 benchmark confirm that this approach is an alternative to standard linear learning models, such as support vector machines, for text classification. Moreover, because training is fast, this approach may also support text categorization systems in fast graphical investigations of large collections of documents.

Similar resources

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods are needed to search, filter, and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

Classifying Business Types from Twitter Posts Using Active Learning

Today, many companies have adopted Twitter as an additional marketing medium to advertise and promote their business activities. One possible solution for organizing a large number of posts is to classify them into predefined categories of business types. Applying standard text categorization techniques to Twitter is ineffective due to the short-length (140-character limit) characteristic of each ...

Categorization of Components of Learning Outcomes for University Students (Grounded Theory)

Introduction: Although learning outcomes have a theoretical basis, different studies have defined the components of learning outcomes differently. The purpose of this article is to explain the concept and determine the constituent components of learning outcomes for undergraduate students. Methods: This study uses grounded theory. Researchers to study with more than 150 sources inc...

Automatic Text Categorization: Case Study

Text categorization is the process of classifying documents into one or more existing categories [1] according to the themes or concepts present in their contents. Its most common application is document indexing in Information Retrieval Systems (IRS) [2]. The organization of text in categories allows the user to limit the target of a search submitted to IRS, to explore t...

Learning with Unlabeled Data for Text Categorization Using a Bootstrapping and a Feature Projection Technique

A wide range of supervised learning algorithms has been applied to Text Categorization. However, the supervised learning approaches have some problems. One of them is that they require a large, often prohibitive, number of labeled training documents for accurate learning. Generally, acquiring class labels for training data is costly, while gathering a large quantity of unlabeled data is cheap. ...

Publication date: 2004